Want Speed? Pass by Value.

This entry is part of a series, RValue References: Moving Forward.

Be honest: how does the following code make you feel?

std::vector<std::string> get_names();
…
std::vector<std::string> const names = get_names();

Frankly, even though I should know better, it makes me nervous. In principle, when get_names() returns, we have to copy a vector of strings. Then, we need to copy it again when we initialize names, and we need to destroy the first copy. If there are N strings in the vector, each copy could require as many as N+1 memory allocations and a whole slew of cache-unfriendly data accesses as the string contents are copied.

Rather than confront that sort of anxiety, I’ve often fallen back on pass-by-reference to avoid needless copies:

void get_names(std::vector<std::string>& out_param );
…
std::vector<std::string> names;
get_names( names );

Unfortunately, this approach is far from ideal.

  • The code grew by 150%
  • We’ve had to drop const-ness because we’re mutating names.
  • As functional programmers like to remind us, mutation makes code more complex to reason about by undermining referential transparency and equational reasoning.
  • We no longer have strict value semantics[1] for names.

But is it really necessary to mess up our code in this way to gain efficiency? Fortunately, the answer turns out to be no (and especially not if you are using C++0x). This article is the first in a series that explores rvalues and their implications for efficient value semantics in C++.

RValues

Rvalues are expressions that create anonymous temporary objects. The name rvalue refers to the fact that an rvalue expression of builtin type can only appear on the right-hand side of an assignment. Unlike lvalues, which, when non-const, can always be used on the left-hand side of an assignment, rvalue expressions yield objects without any persistent identity to assign into.[2]

The important thing about anonymous temporaries for our purposes, though, is that they can only be used once in an expression. How could you possibly refer to such an object a second time? It doesn’t have a name (thus, “anonymous”); and after the full expression is evaluated, the object is destroyed (thus, “temporary”)!

Once you know you are copying from an rvalue, then, it should be possible to “steal” the expensive-to-copy resources from the source object and use them in the target object without anyone noticing. In this case that would mean transferring ownership of the source vector’s dynamically-allocated array of strings to the target vector. If we could somehow get the compiler to execute that “move” operation for us, it would be cheap–almost free–to initialize names from a vector returned by-value.

That would take care of the second expensive copy, but what about the first? When get_names returns, in principle, it has to copy the function’s return value from the inside of the function to the outside. Well, it turns out that return values have the same property as anonymous temporaries: they are about to be destroyed, and won’t be used again. So, we could eliminate the first expensive copy in the same way, transferring the resources from the return value on the inside of the function to the anonymous temporary seen by the caller.
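Here is a minimal sketch of what such a "steal" looks like, assuming a compiler that supports the C++0x std::move facility covered later in this series. The names buffer_is_stolen and demo_steal are made up for illustration. The check compares the address of the element array before and after the transfer.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: reports whether move-constructing a vector hands
// over the source's heap array instead of copying each string.
bool buffer_is_stolen()
{
    std::vector<std::string> source(3, "a string long enough to need its own allocation");
    std::string const* array_before = source.data();

    // std::move marks source as safe to steal from.
    std::vector<std::string> target(std::move(source));

    // target now owns the very same array of strings.
    return target.data() == array_before && target.size() == 3;
}

void demo_steal()
{
    assert(buffer_is_stolen());
}
```

No string is copied and no new array is allocated; only a few pointers change hands.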

Copy Elision and the RVO

The reason I kept writing above that copies were made “in principle” is that the compiler is actually allowed to perform some optimizations based on the same principles we’ve just discussed. This class of optimizations is known formally as copy elision. For example, in the Return Value Optimization (RVO), the calling function allocates space for the return value on its stack, and passes the address of that memory to the callee. The callee can then construct a return value directly into that space, which eliminates the need to copy from inside to outside. The copy is simply elided, or “edited out,” by the compiler. So in code like the following, no copies are required:

std::vector<std::string> names = get_names();
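One way to observe the RVO is to have an object remember the address at which it was constructed. The sketch below uses a made-up type, Probe, for that purpose. If the copy is elided, the object built inside the function and the caller's variable are one and the same. Elision here is optional under the rules described in this article (later standards, from C++17, guarantee it for this pattern), so treat the result as typical rather than certain on older compilers.

```cpp
#include <cassert>

// Hypothetical type that records where it was originally constructed.
struct Probe
{
    Probe const* constructed_at;
    Probe() : constructed_at(this) {}
    // A copy keeps the address of the original, so a real copy is detectable.
    Probe(Probe const& other) : constructed_at(other.constructed_at) {}
};

Probe make_probe()
{
    return Probe();   // built directly in the caller's space when elided
}

// True when the returned object was constructed in place in the caller's variable.
bool built_in_place()
{
    Probe p = make_probe();
    return p.constructed_at == &p;
}

void demo_rvo()
{
    assert(built_in_place());   // holds whenever the compiler elides the copies
}
```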

Also, although the compiler is normally required to make a copy when a function parameter is passed by value (so modifications to the parameter inside the function can’t affect the caller), it is allowed to elide the copy, and simply use the source object itself, when the source is an rvalue.

std::vector<std::string> 
sorted(std::vector<std::string> names)
{
    std::sort(names.begin(), names.end());
    return names;
}
 
// names is an lvalue; a copy is required so we don't modify names
std::vector<std::string> sorted_names1 = sorted( names );
 
// get_names() is an rvalue expression; we can omit the copy!
std::vector<std::string> sorted_names2 = sorted( get_names() );

This is pretty remarkable. In principle, in the last line of the example above, the compiler can eliminate all the worrisome copies, making sorted_names2 the same object as the one created in get_names(). In practice, though, the principle won’t take us quite that far, as I’ll explain later.
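For readers who want to try this, here is a compilable version of the example. The body of get_names() and its three names are invented for the sketch, as is demo_sorted. The assertions check only the observable behavior: an lvalue argument is left untouched, and both calls produce the same sorted result.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Stand-in for the article's get_names(); the contents are made up.
std::vector<std::string> get_names()
{
    std::vector<std::string> v;
    v.push_back("carol");
    v.push_back("alice");
    v.push_back("bob");
    return v;
}

std::vector<std::string> sorted(std::vector<std::string> names)
{
    std::sort(names.begin(), names.end());
    return names;
}

void demo_sorted()
{
    std::vector<std::string> names = get_names();

    // lvalue argument: sorted() works on its own copy, so names is untouched
    std::vector<std::string> sorted_names1 = sorted(names);
    assert(names.front() == "carol");
    assert(sorted_names1.front() == "alice");

    // rvalue argument: there is no caller-visible object to protect
    std::vector<std::string> sorted_names2 = sorted(get_names());
    assert(sorted_names2 == sorted_names1);
}
```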

Implications

Although copy elision is never required by the standard, recent versions of every compiler I’ve tested do perform these optimizations today. But even if you don’t feel comfortable returning heavyweight objects by value, copy elision should still change the way you write code.

Consider this cousin of our original sorted(…) function, which takes names by const reference and makes an explicit copy:

std::vector<std::string> 
sorted2(std::vector<std::string> const& names) // names passed by reference
{
    std::vector<std::string> r(names);        // and explicitly copied
    std::sort(r.begin(), r.end());
    return r;
}

Although sorted and sorted2 seem at first to be identical, there could be a huge performance difference if a compiler does copy elision. Even if the actual argument to sorted2 is an rvalue, the source of the copy, names, is an lvalue,[3] so the copy can’t be optimized away. In a sense, copy elision is a victim of the separate compilation model: inside the body of sorted2, there’s no information about whether the actual argument to the function is an rvalue; outside, at the call site, there’s no indication that a copy of the argument will eventually be made.

That realization leads us directly to this guideline:

Guideline: Don’t copy your function arguments. Instead, pass them by value and let the compiler do the copying.

At worst, if your compiler doesn’t elide copies, performance will be no worse. At best, you’ll see an enormous performance boost.
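A copy-counting type makes the difference measurable. In the sketch below, Heavy, make_heavy, and the copies_* functions are all hypothetical names. The zero in the by-value rvalue case assumes the compiler elides the copy, which the compilers discussed here do by default and which later standards (C++17) require.

```cpp
#include <cassert>

// Hypothetical stand-in for an expensive-to-copy type.
struct Heavy
{
    static int copies;
    Heavy() {}
    Heavy(Heavy const&) { ++copies; }
};
int Heavy::copies = 0;

Heavy make_heavy() { return Heavy(); }

// Copies inside the body: the source is always an lvalue here.
void by_reference(Heavy const& h) { Heavy local(h); (void)local; }

// Leaves the copy (or its elision) to the caller's compiler.
void by_value(Heavy h) { (void)h; }

int copies_by_reference_rvalue()
{
    Heavy::copies = 0;
    by_reference(make_heavy());
    return Heavy::copies;
}

int copies_by_value_rvalue()
{
    Heavy::copies = 0;
    by_value(make_heavy());
    return Heavy::copies;
}

int copies_by_value_lvalue()
{
    Heavy h;
    Heavy::copies = 0;
    by_value(h);
    return Heavy::copies;
}

void demo_guideline()
{
    assert(copies_by_reference_rvalue() == 1);  // copy cannot be avoided
    assert(copies_by_value_rvalue() == 0);      // copy elided
    assert(copies_by_value_lvalue() == 1);      // no worse than by reference
}
```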

One place you can apply this guideline immediately is in assignment operators. The canonical, easy-to-write, always-correct, strong-guarantee, copy-and-swap assignment operator is often seen written this way:

T& T::operator=(T const& x) // x is a reference to the source
{ 
    T tmp(x);          // copy construction of tmp does the hard work
    swap(*this, tmp);  // trade our resources for tmp's
    return *this;      // our (old) resources get destroyed with tmp 
}

but in light of copy elision, that formulation is glaringly inefficient! It’s now “obvious” that the correct way to write a copy-and-swap assignment is:

T& operator=(T x)    // x is a copy of the source; hard work already done
{
    swap(*this, x);  // trade our resources for x's
    return *this;    // our (old) resources get destroyed with x
}
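To see the idiom in a complete class, here is a small sketch. IntBuffer is a made-up resource-owning type, not something from a real library, and it is kept deliberately minimal (no move constructor, no bounds checking).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>

// Hypothetical resource-owning class, used only to show the idiom.
class IntBuffer
{
    std::size_t size_;
    int* data_;

public:
    explicit IntBuffer(std::size_t n, int fill = 0)
        : size_(n), data_(new int[n])
    {
        std::fill(data_, data_ + size_, fill);
    }

    IntBuffer(IntBuffer const& other)
        : size_(other.size_), data_(new int[other.size_])
    {
        std::copy(other.data_, other.data_ + size_, data_);
    }

    ~IntBuffer() { delete[] data_; }

    // x is built by the caller: copied from an lvalue, or constructed
    // in place (copy elided) from an rvalue.
    IntBuffer& operator=(IntBuffer x)
    {
        swap(*this, x);   // trade our resources for x's
        return *this;     // our old array is freed when x is destroyed
    }

    friend void swap(IntBuffer& a, IntBuffer& b)
    {
        std::swap(a.size_, b.size_);
        std::swap(a.data_, b.data_);
    }

    std::size_t size() const { return size_; }
    int operator[](std::size_t i) const { return data_[i]; }
};

void demo_assignment()
{
    IntBuffer a(3, 7);
    IntBuffer b(5, 1);

    b = a;                  // lvalue source: one copy, then a swap
    assert(b.size() == 3 && b[0] == 7);
    assert(a.size() == 3 && a[2] == 7);   // the source is unchanged

    b = IntBuffer(2, 9);    // rvalue source: the temporary can become x
    assert(b.size() == 2 && b[1] == 9);
}
```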

Reality Bites

Of course, lunch is never really free, so I have a couple of caveats.

First, when you pass parameters by reference and copy in the function body, the copy constructor is called from one central location. However, when you pass parameters by value, the compiler generates calls to the copy constructor at the site of each call where lvalue arguments are passed. If the function will be called from many places and code size or locality are serious considerations for your application, it could have a real effect.

On the other hand, it’s easy to build a wrapper function that localizes the copy:

std::vector<std::string> 
sorted3(std::vector<std::string> const& names)
{
    // copy is generated once, at the site of this call
    return sorted(names);
}

Since the converse doesn’t hold—you can’t get back a lost opportunity for copy elision by wrapping—I recommend you start by following the guideline, and make changes only as you find them to be necessary.

Second, I’ve yet to find a compiler that will elide the copy when a function parameter is returned, as in our implementation of sorted. When you think about how these elisions are done, it makes sense: without some form of inter-procedural optimization, the caller of sorted can’t know that the argument (and not some other object) will eventually be returned, so the compiler must allocate separate space on the stack for the argument and the return value.

If you need to return a function parameter, you can still get near-optimal performance by swapping into a default-constructed return value (provided default construction and swap are cheap, as they should be):

std::vector<std::string> 
sorted(std::vector<std::string> names)
{
    std::sort(names.begin(), names.end());
    std::vector<std::string> ret;
    swap(ret, names);
    return ret;
}
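With a C++11 compiler the swap is not needed: a by-value parameter named in a return statement is treated as an rvalue, so it is moved into the return value rather than copied. The sketch below checks this with a made-up counting type, Tracked. The function names are also invented.

```cpp
#include <cassert>

// Hypothetical type that counts copy and move constructions separately.
struct Tracked
{
    static int copies;
    static int moves;
    Tracked() {}
    Tracked(Tracked const&) { ++copies; }
    Tracked(Tracked&&) { ++moves; }
};
int Tracked::copies = 0;
int Tracked::moves = 0;

Tracked pass_through(Tracked t)
{
    return t;   // C++11: t is treated as an rvalue here
}

void demo_return_parameter()
{
    Tracked::copies = 0;
    Tracked::moves = 0;

    Tracked result = pass_through(Tracked());
    (void)result;

    assert(Tracked::copies == 0);  // the parameter is never copied
    assert(Tracked::moves >= 1);   // it is moved into the return value
}
```

The exact number of moves depends on how much the compiler elides, which is why the check only requires at least one.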

More To Come

Hopefully you now have the ammunition you need to stave off anxiety about passing and returning nontrivial objects by value. But we’re not done yet: now that we’ve covered rvalues, copy elision, and the RVO, we have all the background we need to attack move semantics, rvalue references, perfect forwarding, and more as we continue this article series. See you soon!

Follow this link to the next installment.

Acknowledgements

Howard Hinnant is responsible for key insights that make this article series possible. Andrei Alexandrescu was posting on comp.lang.c++.moderated about how to leverage copy elision years before I took it seriously. Most of all, though, thanks in general to all readers and reviewers!


  1. Googling for a good definition of value semantics turned up nothing for me. Unless someone else can point to one (and maybe even if they can), we’ll be running an article on that topic—in which I promise you a definition—soon. 

  2. For a detailed treatment of rvalues and lvalues, please see this excellent article by Dan Saks 

  3. Except for enums and non-type template parameters, every value with a name is an lvalue. 

Posted Saturday, August 15th, 2009 under Value Semantics.

150 Responses to “Want Speed? Pass by Value.”

  1. AK says:

    Great article, but my experimentation with VS2012 shows that references are faster in some circumstances.

    I tested a small class that printed when objects of that class were Copy, Move or Default constructed. I then used the following two small tests, similar to your vector sorting functions.

    MyClass s;

    MyClass refRet = DoByRef(s);
    MyClass copyRet = DoByCopy(s);

    With the implementations being:

    MyClass DoByRef(const MyClass &s)
    {
        MyClass ret(s);
        ret.DoSomething();
        return ret;
    }

    MyClass DoByCopy(MyClass s)
    {
        MyClass ret(s);
        ret.DoSomething();
        return ret;
    }

    The results were that DoByRef does a total of one Copy construction, and DoByCopy does two. Now this could be a result of all the code being in one module, I’m not sure yet, but it does show that at times, passing by reference is considerably faster.

    • AK says:

      OK looking a little more into it, if I construct ret with a std::move(s) instead, and pass an rvalue to DoByCopy, then I get a total of one move. If I pass an lvalue to it, I get a copy and a move (Slightly slower than just passing by reference which is just one copy). However this only makes passing by value useful if you’re passing in an rvalue. If you pass in an lvalue for some reason, you’re actually slower than just using a reference.

      So basically I’m not convinced (yet) that it’s worth the trouble to stop using pass by reference in real code.

  2. David says:

    Any chance you could fix the broken link at the bottom of the article please?

    “Follow this link to the next installment.”

  3. Vitali says:

    Does the same rule apply if your type is largish (let’s say 128 bytes)? To me, it seems like a pass-by-value would be pretty expensive since the swap or rvalue move will still effectively be a copy, thereby causing 2 copies of the data instead of 1. In the case where you are supplying an rvalue, you end up copying the data twice as well since the move into the local variable will be a copy as will the swap. Thus the pass-by-value case will always involve two copies.

    Thus to me it seems like for larger types you should still use const&. If the type can be moved more efficiently than copied, then APIs using it should provide an additional && API.

  4. Noah Roberts says:

    I heard about this article when it was cited in Going Native 2013 where they were recommending to use value semantics by default.

    I can see why this improves performance when dealing with temporaries and/or things that can be moved. There’s a case I worry about though and I wonder what you have to say on the matter.

    My worry is that the assumptions are not documented by the signature. The assumption is that I’m going to gain speed because of copy elision, but if I change the calling site such that that can no longer be done by instead passing an lvalue then I lose that performance. For example, I may find that I want to use the same object again so I store the temporary into a variable. Now I’m paying the cost of copies when a reference would have served the same purpose.

    That in and of itself doesn’t bother me, what bothers me is that this will happen without any warning. The semantic use of the function completely alters the performance, or am I missing something?

    What would you recommend to avoid this issue?

  5. Absolutely Brilliant! Thanks! :D

    I will always pass by value for most situations.

  6. Paul says:

    my brain hurts after reading this. Do things really need to be so complicated in C++?

    • Not if you don’t care about performance. The price of performance is dealing with issues closer to the machine model. You can cover those issues up with “pretty” language abstractions like garbage collection… but then you have to give the performance back ;-)

      • JR says:

        I sympathise with Paul’s lament.

        If anyone was producing a new high-performance language and they considered the use case of a user procedure to initialise a read-only vector of strings would they come up with such a pig’s ear of a solution as C++ has ended up with? C++’s excuse is C.

        It’s got nothing to do with garbage collection.

        • Marcel Kincaid says:

          You’re wrong. This has nothing to do with C, and it has everything to do with the fact that C++ doesn’t garbage collect. If you think otherwise, go ahead and try to design your high-performance language.

      • Jakob says:

        In Java, the above issue doesn’t exist since you have pointers for everything, which is pretty neat in my opinion. Then of course, you have garbage collection instead.

        I’ve been thinking if you couldn’t do the same with smart pointers, and still avoid having garbage collection. It’s just that smart pointers are so ugly in C++…

  7. rhalbersma says:

    In a chess program, moves can be generated and stored in a std::vector. Most programs have split the move generation by piece type, and they pass a std::vector by reference (or pointer) to append the various parts together. Code would typically look like this:

    enum PieceTypes { King, Queen, Rook, Bishop, Knight, Pawn };
    class Position;
    class Move;
     
    // C++03 style move generation
     
    template<PieceTypes>
    void generate(const Position&, std::vector<Move>& moves);
     
    void generate(const Position& p, std::vector<Move>& moves)
    {
        generate<King>(p, moves);
        generate<Queen>(p, moves);
        generate<Rook>(p, moves);
        generate<Bishop>(p, moves);
        generate<Knight>(p, moves);
        generate<Pawn>(p, moves);
    }
     
    // call as:
    Position p;
    std::vector<Move> moves;
    generate(p, moves);

    Translating this to C++11 style with std::vector return-by-value runs into the small problem that the standard containers have no infix operators to append containers in the same way as one can do with std::string. Adding a template operator+ that also has a pass-by-value left argument will fix this.

    // C++11 style move generation
     
    // infix operator+ to do append on std::vector as with std::string
    // append is associative but not commutative
    template<typename T>
    std::vector<T> operator+(std::vector<T> lhs, const std::vector<T>& rhs)
    {
        lhs.insert(end(lhs), begin(rhs), end(rhs));
        return lhs;
    }
     
    template<PieceTypes>
    std::vector<Move> generate(const Position& p);
     
    std::vector<Move> generate(const Position& p)
    {
        return (
            generate<King>(p) +
            generate<Queen>(p) +
            generate<Rook>(p) + 
            generate<Bishop>(p) +
            generate<Knight>(p) +
            generate<Pawn>(p)
        );
    }
     
    // call as:
    Position p;
    auto moves = generate(p);

    Of course, the use of the operator+ can be debated here. For numeric applications one might also use operator+ to do element-by-element addition.

    • rhablersma says:

      OK, I guess I should have asked a question to generate a reply. So here’s a question: how can I make the above code avoid all unnecessary copies? Reading from the later installments of this blog series, I figure I need 4 overloads of operator+. Here’s a first try:

      template<typename T>
      std::vector<T> operator+(const std::vector<T>& lhs, const std::vector<T>& rhs)
      {
              auto tmp = lhs;
              tmp.insert(tmp.end(), rhs.begin(), rhs.end());
              return tmp;
      }
       
      template<typename T>
      std::vector<T> operator+(const std::vector<T>& lhs, std::vector<T>&& rhs)
      {
              rhs.insert(rhs.begin(), lhs.begin(), lhs.end());
              return std::move(rhs);
      }
       
      template<typename T>
      std::vector<T> operator+(std::vector<T>&& lhs, const std::vector<T>& rhs)
      {
              lhs.insert(lhs.end(), rhs.begin(), rhs.end());
              return std::move(lhs);
      }
       
      template<typename T>
      std::vector<T> operator+(std::vector<T>&& lhs, std::vector<T>&& rhs)
      {
              lhs.insert(lhs.end(), rhs.begin(), rhs.end());
              return std::move(lhs);
      }

      It’s different from the Matrix example in this blog because operator+ is not commutative, and it’s also different to the std::string example in the Nxxx standard proposal documents because std::vector does not have a built-in operator+=. So another question is: do I also need to have 4 overloads of operator+= to let std::vector have full append functionality? What signature would they have to have?

      • Howard Hinnant says:

        Your solution is fine as far as move semantics goes. A reasonable rewrite is to replace all 4 of your signatures with just:

        template<typename T>
        std::vector<T> operator+(std::vector<T> lhs, const std::vector<T>& rhs)
        {
                lhs.insert(lhs.end(), rhs.begin(), rhs.end());
                return lhs;
        }
        

        That being said, I would be tempted to just do the following:

        std::vector<Move>
        generate(const Position& p)
        {
            std::vector<Move> moves;
            generate<King>(p, moves);
            generate<Queen>(p, moves);
            generate<Rook>(p, moves);
            generate<Bishop>(p, moves);
            generate<Knight>(p, moves);
            generate<Pawn>(p, moves);
            return moves;
        }
        

        This isn’t quite as “cute” but is perfectly efficient. And this also comes with a caveat. If in the original code the client is calling this over and over as in:

        std::vector<Move> moves;
        while (...)
        {
            // ...
            moves.clear();
            generate(p, moves);
            // ...
        }
        

        Then you might consider leaving your code as is. Count trips to the heap. Whatever minimizes that count is the best solution. Don’t throw away vector capacity to then just allocate it back. If you can reuse capacity, doing so is always a win. If moves is likely to hold capacity prior to the call to generate then attempt to take advantage of it.
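To make the "count trips to the heap" advice concrete, here is a small sketch that refills one vector repeatedly and counts how often its buffer address changes. The names fill_values and reallocations_when_reused are invented for the example, and int stands in for Move. It relies on the standard guarantee that, after reserve(n), insertions do not reallocate until the size would exceed the capacity, and that clear() keeps the capacity.

```cpp
#include <cassert>
#include <vector>

// Hypothetical generator that appends n values to an existing vector.
void fill_values(std::vector<int>& out, int n)
{
    for (int i = 0; i < n; ++i)
        out.push_back(i);
}

// Counts how many times the buffer address changes over `rounds` refills.
int reallocations_when_reused(int rounds, int n)
{
    std::vector<int> v;
    v.reserve(n);                   // one trip to the heap up front
    int const* buffer = v.data();
    int changes = 0;
    for (int r = 0; r < rounds; ++r)
    {
        v.clear();                  // size becomes 0, capacity is kept
        fill_values(v, n);
        if (v.data() != buffer)
        {
            ++changes;
            buffer = v.data();
        }
    }
    return changes;
}

void demo_reuse()
{
    assert(reallocations_when_reused(10, 32) == 0);
}
```

A version that returns a fresh vector from each call would instead allocate at least once per round.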

        • rhalbersma says:

          Hi Howard,

          Thanks for your comment. I make an upfront reservation for the move vector’s capacity, so passing the pointer around would minimize the number of heap allocations:

                  std::vector<Move> generate(const Position& p)
                  {
                      std::vector<Move> moves;
                      moves.reserve(32);
           
                      generate<King>(p, moves);
                      .... 
           
                      return moves;
                  }

          One more question, though: would this still apply if I would use your stack allocator? So with

                  std::vector<Move, stack_alloc<Move, 32> > moves;

          wouldn’t that make the “+” notation more viable?

    • Kos says:

      IMHO, as the functions are only concerned with “generating values”, instead of knowing about vectors they should just take an output_iterator as parameter and work with it.

      Then you’d just make a back_inserter to your vector, pass it around and you’re clear to go.
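A minimal sketch of that output-iterator approach follows. The generators generate_squares and generate_evens are invented stand-ins for the per-piece move generators, with int standing in for Move. Each generator returns the advanced iterator so the calls can be chained.

```cpp
#include <cassert>
#include <iterator>
#include <vector>

// Hypothetical generator: writes 0, 1, 4, 9, ... to any output iterator.
template <class OutputIt>
OutputIt generate_squares(int count, OutputIt out)
{
    for (int i = 0; i < count; ++i)
        *out++ = i * i;
    return out;
}

// Hypothetical generator: writes 0, 2, 4, ... to any output iterator.
template <class OutputIt>
OutputIt generate_evens(int count, OutputIt out)
{
    for (int i = 0; i < count; ++i)
        *out++ = 2 * i;
    return out;
}

// Appends the output of both generators to a single vector.
std::vector<int> collect_all(int count)
{
    std::vector<int> values;
    values.reserve(2 * count);   // optional: one allocation for everything
    generate_evens(count, generate_squares(count, std::back_inserter(values)));
    return values;
}

void demo_output_iterator()
{
    std::vector<int> v = collect_all(3);   // squares 0 1 4, then evens 0 2 4
    assert(v.size() == 6);
    assert(v[2] == 4 && v[3] == 0 && v[5] == 4);
}
```

The generators know nothing about std::vector, so the same code can write into a list, a fixed array, or a stream iterator.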

  8. Howard Hinnant says:

    I think this article should be updated for C++11. There are two things wrong with it:

    1. It leaves the impression that one should always write your assignment operator like so:

      T& operator=(T x)    // x is a copy of the source; hard work already done
      {
          swap(*this, x);  // trade our resources for x's
          return *this;    // our (old) resources get destroyed with x
      }
      

      But in some important cases, this is a large performance penalty. Vector-like classes where heap memory can be reused during the copy assignment are a classic example. I’ve just written a short example showing as high as a 7X performance penalty.

    2. In C++11 the correct way to write sorted is:

      std::vector<std::string>
      sorted(std::vector<std::string> names)
      {
          std::sort(names.begin(), names.end());
          return names;
      }
      

      Implicit return-by-move from by-value parameters is now required.

    The basic point of the article is sound: Passing by value is an important tool in the tool box. But I’ve seen too many references to this article that mistakenly throw design and testing out the window on this issue, and translate this article into “always pass by value”.

    • 1000% agreed

      • In the case of C++11, wouldn’t it make sense to always use the by-value version of the assignment operator if a move constructor is provided in addition?

        T(T&& x)
        {
             // Steal resources from x.
        }
        
        T& operator=(T x)    // uses T(T&&) iff x is an r-value
        {
            swap(*this, x);
            return *this;
        }
        
        • Howard Hinnant says:

          I think you just hammered my first point above home. Trying code:

          #include <cstddef>
          #include <new>
          #include <utility>
          
          template <class T>
          class MyVector
          {
              T* begin_;
              T* end_;
              T* capacity_;
          
          public:
              MyVector()
                  : begin_(nullptr),
                    end_(nullptr),
                    capacity_(nullptr)
                  {}
          
              ~MyVector()
              {
                  clear();
                  ::operator delete(begin_);
              }
          
              MyVector(std::size_t N, const T& t)
                  : MyVector()
              {
                  if (N > 0)
                  {
                      begin_ = end_ = static_cast<T*>(::operator new(N*sizeof(T)));
                      capacity_ = begin_ + N;
                      for (; N > 0; --N, ++end_)
                          ::new(end_) T(t);
                  }
              }
          
              MyVector(const MyVector& v)
                  : MyVector()
              {
                  std::size_t N = v.size();
                  if (N > 0)
                  {
                      begin_ = end_ = static_cast<T*>(::operator new(N*sizeof(T)));
                      capacity_ = begin_ + N;
                      for (std::size_t i = 0; i < N; ++i, ++end_)
                          ::new(end_) T(v[i]);
                  }
              }
          
              MyVector(MyVector&& v)
                  : begin_(v.begin_),
                    end_(v.end_),
                    capacity_(v.capacity_)
              {
                  v.begin_ = nullptr;
                  v.end_ = nullptr;
                  v.capacity_ = nullptr;
              }
          
          #ifndef USE_SWAP_ASSIGNMENT
          
              MyVector& operator=(const MyVector& v)
              {
                  if (this != &v)
                  {
                      std::size_t N = v.size();
                      if (capacity() < N)
                      {
                          clear();
                          ::operator delete(begin_);
                          begin_ = end_ = static_cast<T*>(::operator new(N*sizeof(T)));
                          capacity_ = begin_ + N;
                      }
                      T* p = begin_;
                      const T* q = v.begin_;
                      for (; p < end_ && q < v.end_; ++p, ++q)
                          *p = *q;
                      if (q < v.end_)
                      {
                          for (; q < v.end_; ++q, ++end_)
                              ::new(end_) T(*q);
                      }
                      else
                      {
                          while (end_ > p)
                          {
                              --end_;
                              end_->~T();
                          }
                      }
                  }
                  return *this;
              }
          
              MyVector& operator=(MyVector&& v)
              {
                  clear();
                  swap(v);
                  return *this;
              }
          
          #else
          
              MyVector& operator=(MyVector v)
              {
                  swap(v);
                  return *this;
              }
          
          #endif
          
              void clear()
              {
                  while (end_ > begin_)
                  {
                      --end_;
                      end_->~T();
                  }
              }
          
              std::size_t size() const
                  {return static_cast<std::size_t>(end_ - begin_);}
              std::size_t capacity() const
                  {return static_cast<std::size_t>(capacity_ - begin_);}
              const T& operator[](std::size_t i) const
                  {return begin_[i];}
              T& operator[](std::size_t i)
                  {return begin_[i];}
              void swap(MyVector& v)
              {
                  std::swap(begin_, v.begin_);
                  std::swap(end_, v.end_);
                  std::swap(capacity_, v.capacity_);
              }
          };
          
          template <class T>
          inline
          void
          swap(MyVector<T>& x, MyVector<T>& y)
          {
              x.swap(y);
          }
          
          #include <iostream>
          #include <string>
          #include <chrono>
          
          int main()
          {
              MyVector<std::string> v1(1000, "1234567890123456789012345678901234567890");
              MyVector<std::string> v2(1000, "1234567890123456789012345678901234567890123456789");
              typedef std::chrono::high_resolution_clock Clock;
              typedef std::chrono::duration<double, std::micro> US;
              auto t0 = Clock::now();
              v2 = v1;
              auto t1 = Clock::now();
              std::cout << US(t1-t0).count() << " microseconds\n";
          }
          
          $ clang++ -stdlib=libc++ -std=c++11 -O3 -DUSE_SWAP_ASSIGNMENT test.cpp
          $ a.out
          174.516 microseconds
          $ a.out
          180.83 microseconds
          $ a.out
          175.848 microseconds
          
          $ clang++ -stdlib=libc++ -std=c++11 -O3  test.cpp
          $ a.out
          26.339 microseconds
          $ a.out
          24.179 microseconds
          $ a.out
          24.103 microseconds
          
          • Thanks for posting the example, that made your point in the previous comment clear.

            Matthias
          • Agnostic says:

            But what if an exception is thrown while doing *p = *q; in the “optimal” operator=? We have a vector with partially copied items. Isn’t this a case where we are trading safety for speed?

            BTW: we have a bug in this vector for the case if (capacity() < N) the end_ pointer equals to begin_ and is not updated by += N below.

          • coder says:

            I replicated this test using gcc-4.7.2 -std=c++11 -O3

            compiled with -DUSE_SWAP_ASSIGNMENT:
            47 microseconds
            48 microseconds
            48 microseconds
            47 microseconds

            without -DUSE_SWAP_ASSIGNMENT:
            36 microseconds
            36 microseconds
            36 microseconds
            36 microseconds

            That yields a .32X performance difference. Could not reproduce your 7x. But even .32X is very significant.

        • Andrea says:

          Wow! Tried to understand what’s going on here… Am I right in assuming the difference is in the ::new(end_) T(v[i]) copy construction loop in the cctor (used when -DUSE_SWAP_ASSIGNMENT) vs the *p = *q assignment loop when using the “normal” MyVector& operator=(const MyVector &)? But then why such a difference? There’s a placement new up there (i.e. not real memory allocations, only string’s cctor invocation), how can it be that much worse than an assignment? I’m sure I’m missing something… Also, testing on a Mac with latest clang from trunk and gcc-4.7 built from sources, I can’t see that timing difference when compiling with g++ -std=c++11. I.e., both old and swap-based assignments time almost the same as clang’s best case. Is the different size of std::string (8 bytes in gnu’s libstdc++ vs 24 bytes in clang’s libc++) the cause of gcc’s insensitivity to the kind of assignment used?

          Andrea

          • Howard Hinnant says:

            Think of it this way: the most efficient way to recycle something is to re-use it. The copy assignment operator can sometimes re-use memory, instead of deallocating it and then allocating more. That is what is happening in this example. One way deallocates memory just to turn around and allocate it back. The other way holds on to its memory and re-uses it for the new value. The optimization is to simply avoid calling new/delete as much as you can.

            I imagine the difference you’re seeing with gcc is that they are using a ref-counted string. Try the experiment again, but using MyVector<std::vector<int>> instead.

          • Andrea says:

            Howard, thanks for your helpful reply. I think I’m getting hold of it now. Can you just confirm my understanding is right when I say that: When using vectors of objects that have “external” resources (i.e. allocate memory on the heap, as is the case for strings), going through the route of invoking their constructor (as when doing ::new(end_) T(v[i]) in the copy constructor loop used when -DUSE_SWAP_ASSIGNMENT) makes you incur the penalty of allocations even if those are placement news. Instead, the *p = *q assignments in the plain old assignment operator’s loop can re-use the already allocated memory on the destination (in particular when, as in this case, the destination is a longer string) and make this approach more efficient. As you suggested, using vectors of int (or I think more generally PODs/aggregate objects) levels out the difference of the two approaches because that extra price during the placement new does not have to be paid.

            Andrea

              Quote
          • Howard Hinnant says:

            That sounds right.

              Quote
  9. Andrzej Krzemieński says:

    Hi Dave, I tried to apply the idiom for copy assignment you describe, but I encountered one suspicious nuance when trying to write the exception specification for my copy assignment. I have a “gut feeling” that something is wrong, although I cannot clearly specify it. Here is my problem. Without passing by value I would specify my assignment like this:

    T& T::operator=( T const& x )
    { 
      T tmp(x);          // can throw
      swap(*this, tmp);  // no-throw (let's assume)
      return *this; 
    }

    This really says what I need to do to assign the value from one object represented by reference ‘x’. I need to make a copy first, and swap it with my value. Can this operation throw? Surely: a copy constructor is a typical place where one would expect a throw.

    Now, is the answer the same for the “pass-by-value” idiom?

    T& operator=(T x)    // this copying is not inside the assignment
    {
        swap(*this, x);  // no-throw
        return *this;  
    }

    Technically we are doing the same thing, but copying is somehow ejected outside of the function. The function only does a no-fail (let’s assume that) swap. So in fact, I can write:

    T& operator=(T x) noexcept
    {
        swap(*this, x);  // no-throw
        return *this;  
    }

    I am telling the truth: there is nothing in the copy assignment that could cause a throw. But anyone who tries to use the assignment may get a throw, because our function, even though it does not copy itself, forces you to copy T, even though you are not (or may not be) aware of it. That is, by declaring the function like this (with noexcept), while technically being correct, I confuse everyone by implying that using this assignment operator does not raise exceptions. I would be more honest if I wrote:

    T& operator=(T x) noexcept( std::is_nothrow_copy_constructible<T>::value )
    {
        swap(*this, x);  // no-throw
        return *this;  
    }

    But this also looks strange: why would I base the condition on the properties of the constructor that I never call?

      Quote
    • Hi Andrzej,

      I wish this were crisper, but I would say:

      • operator= is unconditionally noexcept
      • assignment from an lvalue is noexcept if T’s copy constructor is noexcept
      • assignment from an rvalue is noexcept if move-constructing a T is noexcept
        Quote
  10. litb says:

    “Except for enums, every value with a name is an lvalue.”. I know I’m going to annoy you by this, but I want to inform the innocent reader that integer, pointer and member pointer template parameters aren’t lvalues either.

      Quote
  11. someguy says:

    With sorted3, copy elision seems to be more complicated. As far as the function is concerned, the argument is an lvalue, so unless the sorted function knows that the argument passed to the sorted3 function is an rvalue, it can’t perform copy elision. Or have I misunderstood? If this is the case, then it must be capable of interprocedural optimization, right? Why can’t it elide when the function parameter is returned then?

      Quote
  12. Tony Van Eerd says:

    Note also the possible semantic difference between pass by value and pass by reference: For example:

    struct B { int i; };
    struct D : B { int j; };
    
    void by_ref(B const & b)
    {
    }
    void by_value(B b)
    {
    }
    

    by_ref() takes “anything that is-a B”, i.e. anything derived from B, whereas by_value() takes a B, and B only.

    … or so it seems. Actually, assuming a B copy-constructor of the form B::B(B const &), then by_value(d) still works, via implicit by_value(B::B(d)).

    B‘s copy constructor would need to be explicit to avoid this.

    Not sure how big of a deal that is, but passing by value (while using explicit constructors) could prevent slicing in some situations.

    (In fact, in general, the “slicing” of a Derived when constructing a Base from a Derived might be surprising in some situations. ie I suspect many people don’t think of their copy-constructor being used as a slicer. Or being ‘polymorphic’ in some way. Interesting…)

      Quote
  13. Tal Agmon says:
    “Except for enums, every value with a name is an lvalue.” Did you mean rvalue?
      Quote
  14. kaalus says:

    Forgive me the rant. The article is very good, and I really admire people like Dave Abrahams, who keep in touch with all this, despite the complexity.

      Quote
  15. kaalus says:

    It seems to me that C++ has got itself into such a blind alley.

    After reading the article, step back and have a look: such a basic thing, passing/returning values from functions. Yet it is so complex, full of traps. It takes many pages to explain, and it requires the reader to have many years of C++ experience to be really able to grasp the explanations, and apply them successfully.

    Can anyone be expected to write reliable software, solving complex problems, if you need 10 years of experience to just reliably return a value from a function, without shooting yourself in the foot? But this article shows only a tip of the iceberg, really. What a complete horror this becomes when you factor in variadic templates, template specializations, overloading rules, overriding rules, lambda expressions, SFINAE rules, type promotions. With this on, you can pretty much never be sure your code is correct, let alone optimal.

    C++ has become clever, way too clever for an ordinary programmer to use effectively.

    It no longer looks good even on toy problems – too many caveats.

    As a result, the average programmer uses C++ in a shoddy way, resulting in buggy, suboptimal code. This is what 95% of C++ programmers in the wild are doing, in my experience.

    The remaining 5%, who have OCD and are really determined to do all things right without cutting corners, end up agonizing for hours on every trivial function definition, all in the spirit of the above article. Of course they get nowhere for weeks.

    C++ is a tool. Tools are for making people’s lives easier. C++ doesn’t anymore. It creates more problems than it solves.

      Quote
    • peterchen says:

      Your rant is definitely justified. My viewpoint differs in some aspects – so here’s mine:

      In defense of C++: Never underestimate “simple”. Once you look deeper into it, it often gets terribly complicated – or in other words, it’s amazing on what a tower of shoulders we stand. To blame the same on other languages: garbage collection is simple – unless you look into it. Who would have thought cleaning up objects would be so terribly complicated?

      Picking the wrong way to pass a parameter is rarely shooting yourself in the foot. Going by the “simpler rules” (pass by reference to avoid copies) will virtually never be wrong. Even indiscriminately passing by value will be good enough most of the time.

      All code of a given complexity is shoddy, incomplete and questionable. The question is: is it good enough for its purpose? (Our biggest problem here: change of purpose.) Knowing what corners you can cut and which you cannot makes you a good programmer. The “obsessive about everything guy” you describe is not (and I recognize myself in your OCD description).

      Where I agree:

      You have discovered the conundrum of choice: we pick C++ because it gives us choice – in the context of the example, we can pass by value or reference or pointer, or we might sit behind a template parameter and even not know how we pass. However, that choice is also what makes C++ hard, twice the choices isn’t always twice as good.

      I am unhappy with the RValue references, because they increase the tax on a typical class without a default copy implementation. OTOH, they do solve a problem of C++ while preserving the choices enabled by other features.

      The number of features in a language is a fundamental problem for languages. Adding lambdas and templates and exceptions and rvalue references to the language makes my job easier, because I can use them where appropriate. It also makes my job harder, because to understand your code, I might have to learn lambdas whether I like them or not. (And whether I would have used them or not. I think this disparity is a great source of rewrites: Great functionality, but they use exceptions, we use error codes. Cool solution, but I really hate template meta programming. etc.)

      There is no universally perfect spot. C++ will see less use in large systems and desktop development. C++ will see more use in small microprocessor and embedded systems, because hardware and compilers are catching up just now. There’s still room for C++, and due to its variety, a lot of room.

      C++ is a toolbox, not a single tool.

        Quote
      • Robert Ramey says:

        This is a great comment. It captures the yin/yang of C++ development well.

        I think the solution is building applications in layers of abstraction. This can permit all/most of the issues related to low level optimization to be addressed in lower layers while upper layers can ignore most of this. This is why I’m a fan of C++. Unfortunately, I think a lot of programming shops miss this and just start coding rather than building applications layer by layer. Using C++ in this way causes lots of frustration and confusion.

        C++ is a toolbox for making other toolboxes.

        Robert Ramey

          Quote
    • AK says:

      Yup. I’ve been doing C++ for many years but the companies I worked in used very old versions of compilers that didn’t support C++11. We only switched a short while ago. Having to read and digest articles like this on something so simple is rather nuts. I, as most of us will, work it out and start to get used to it. However, such a common task becoming so complicated is just insane.

        Quote
  16. x.martian says:

    This article was copied verbatim by someone in his blog:

    prasanthmadhavan dot wordpress dot com @ /2010/11/26/the-r-value/

      Quote
  17. x.martian says:

    I’m not sure that I agree with you that

    std::vector<std::string>
    sorted2(std::vector<std::string> const& names) // names passed by reference
    {
        std::vector<std::string> r(names); // and explicitly copied
        std::sort(r.begin(), r.end());
        return r;
    }

    would in general be less efficient, even with copy elision, than:

    std::vector<std::string>
    sorted(std::vector<std::string> names)
    {
        std::sort(names.begin(), names.end());
        return names;
    }

    It seems to me that in either case, at least one copy must be made. With return value elision, sorted2 requires no more than one copy either.

    Did I miss anything?

      Quote
    • Yes. I’m not sure what, but you did.

      When the argument is an rvalue, the compiler is allowed to call the 2nd one with no copying. Now, the fact is that in practice, because of the way compilers implement function calls, it requires a copy today, but with move semantics no copy is needed. To avoid the copy today, you can do something like this:

      std::vector<std::string>
      sorted(std::vector<std::string> names)
      {
          std::sort(names.begin(), names.end());
          std::vector<std::string> names_;
          swap(names,names_);
          return names_;
      }
        Quote
      • x.martian says:
        the compiler is allowed to call the 2nd one with no copying.

        I can’t see why that is possible. In your example of

        // get_names() is an rvalue expression; we can omit the copy!
        std::vector<std::string> sorted_names2 = sorted( get_names() );
        

        The object returned by get_names() is a temp object which will be out of scope when sorted returns. One may argue that in principle, with RVO, sorted() receives the return value object from the caller, which in turn can pass it to get_names(). But such a scheme seems to violate the standard, which requires all arguments to be evaluated BEFORE entering the callee.

          Quote
        • The temporary’s lifetime lasts to the end of the full expression. That includes the assignment.

            Quote
          • x.martian says:
            The temporary’s lifetime lasts to the end of the full expression.That includes the assignment.   

            But that’s not good enough. You are going to use sorted_names2 after the statement, aren’t you? You need to preserve the values by copying them to sorted_names2, right? My argument is that you’ll have to make at least one copy of the values, either by using the copy constructor, or by assignment. Hence, there’s no advantage of using 1.

              Quote
          • Once you elide the copy, the source of the copy no longer gets destroyed; the lifetime becomes that of the thing it was “copied” into.

              Quote
          • x.martian says:
            Once you elide the copy, the source of the copy no longer gets destroyed; the lifetime becomes that of the thing it was “copied” into.   

            How would that be possible? The “source” of the copy is allocated in the stack and is popped out upon the return of sorted function.

              Quote
          • As noted in the article itself, the compiler allocates space for the return value outside the stack frame of the function doing the returning. It constructs the source there.

              Quote
      • peterchen says:

        To see if I understand your point correctly:

        You say that with a good compiler,

        vector getsomevector() { ... }
        vector x = sorted(getsomevector());

        there’s no copy for the sort, since the getsomevector() temporary is moved into a local, where the sort happens, which in turn is moved into x?

          Quote
        • No, I’m saying a good C++03 compiler will neither incur a copy to pass an rvalue into the function, nor will it, in most cases, incur a copy to pass a return value out of a function. There are a few cases where the realities of calling conventions make it impractical to suppress the copy upon return, and this is one of them (think about how the optimization must be implemented and you’ll see why), which explains the need for swap() in my example on real compilers, even good ones.

            Quote
          • x.martian says:

            So now you need to do a swap, which creates more chance for the compiler to goof up.

            Why not use this good old

            sorted_names sorted2(std::vector& names);

            signature. The only thing I need to pray for is a RVO.

              Quote
          • Why don’t you try that with an rvalue argument and find out?

              Quote
          • I understood the vector example where an implicit copy is better than an explicit copy. But I’m not sure what to do in this situation:

            mystring f();
            const mystring s1 = f();
            const mystring& s2 = f();

            If I want a const mystring, which one of the above should I use, or are both equivalent? I created a custom class and observed that both statements make an equal number of copy constructor calls: either 0 copies or 1 copy, depending upon whether the return value is known at compile time or not. Many of my friends say catching the return value by reference is more efficient. After reading this article I feel that either the first one is better or both are equivalent. Am I right?

              Quote
      • x.martian says:
        Why don’t you try that with an rvalue argument and find out?   

        Yes, indeed. I tried the following with Microsoft VC++10:

        #include <iostream>   // std::cout
        #include <string.h>   // memset
        #include <tchar.h>    // _tmain, _TCHAR (MSVC)
        
        #define SIZE 1000
        
        class Big
        {
        public:
            Big() {
                std::cout << "Big constructor" << std::endl;
                //res = malloc(SIZE);
            }
        
            ~Big() {
                std::cout << "Big destructor" << std::endl;
                //free(res);
            }
        
            static Big Build() {
                Big big;
                memset(big.res, 0, SIZE);
                std::cout << "first " << &big << std::endl;
                return big;
            }
        
            static Big Revise(Big value) {
                std::cout << "second " << &value << std::endl;
                memset(value.res, 1, SIZE);
                return value;
            }
        
            static Big Revise1(Big const& ref) {
                Big value(ref);
                std::cout << "second " << &value << std::endl;
                memset(value.res, 1, SIZE);
                return value;
            }
        
        private:
            char res[SIZE];
        };
        
        
        int _tmain(int argc, _TCHAR* argv[])
        {
            Big b = Big::Revise(Big::Build());
            std::cout << "third " << &b << std::endl;
            std::cout << std::endl;
            Big c = Big::Revise1(Big::Build());
            std::cout << "third1 " << &c << std::endl;
            return 0;
        }
        

        And here is the result:

        Big constructor
        first 0022F5F0
        second 0022EA24
        Big destructor
        Big destructor
        third 0022F208
        
        Big constructor
        first 0022F9D8
        second 0022EE20
        Big destructor
        third1 0022EE20
        

        Clearly, passing by value reduced the number of copies of the object.

          Quote
        • James Hopkin says:

          I’m a little confused by your results and your comment. The output appears to show that the compiler managed to elide one of the copies only when passed by reference.

          For reference, here’s what I get running the same code on gcc4.5.1:

          Big constructor
          first 0xbf8cace0
          second 0xbf8cace0
          Big destructor
          third 0xbf8cb0c8
           
          Big constructor
          first 0xbf8ca510
          second 0xbf8ca8f8
          Big destructor
          third1 0xbf8ca8f8
          Big destructor
          Big destructor

          Here passing by value results in only one copy. I’m going to look at the optimisation settings to see if it can do better for either case.

            Quote
          • James Hopkin says:

            By the way, I tried adding a move constructor to Big. It is only called if I add an std::move to the return statement of Revise. It seems the compiler is failing to identify the return value of Revise as an opportunity to elide the copy (as allowed by 12.8/34 in the FCD), so neither does it implicitly treat the expression as an rvalue (as required by 12.8/35).

            Presumably the problem is with trying to elide the copies at both ends (parameter and return).

              Quote
          • FWIW, the compiler is supposed to implicitly wrap any by-value returns in std::move(…). If adding std::move(…) around your return value makes any difference, that’s either a (knowingly) partially-implemented feature or a bug.

              Quote
          • x.martian says:

            Actually, we also have to make sure the compiler does not inline any of the functions.

              Quote
        • There’s already a fairly complete test referenced from this comment

            Quote
  18. Vincent says:

    I am curious about one thing: I tried the example you gave above, and slightly modified the nrvo and urvo tests as follows:

     
    X nrvo_source(bool b)
    {
        trace t("nrvo_source");
        if (b)
        {
          X a;
          return a;
        }
        else
        {
          X b ;
          return b ;
        }
    }
     
    X urvo_source(bool b)
    {
        trace t("urvo_source");
        if (b)
          return X() ;
        else
          return X() ;
    }

    Now g++ 4.5 still elides the copy in the urvo case, but not anymore in the nrvo case. It is not crystal clear to me why it is so…

      Quote
    • The caller allocates room on the stack for the return value. In urvo_source, it can construct a new object there, no matter which branch of the if is taken. Now, that happens to be the case for nrvo_source as well, but in general, if the objects are named, they may need to maintain an identity separate from whatever becomes the return value:

      X nrvo_source(bool (&bf)())
      {
          trace t("nrvo_source"); // trace
          X a, b;
          // use a and b here.
          return bf() ? a : b;
      }

      The most likely explanation is that when the compiler sees that two different named objects can be returned, it simply gives up on elision and assumes it needs to copy. That’s a cheap way to make the optimization in many cases without performing flow analysis.

      HTH,

        Quote
      • Vincent says:

        From where I stand (i.e. far away from compiler design…), it just seems that, in the case of nrvo_source(), the same kind of analysis could be performed in each branch of the if(), as in the whole function in your original example. In other words, it should be feasible for a compiler to realise that, in my example, it just needs to construct either a or b at the address passed by the caller.

        Of course, there is nothing it could do when the function is rewritten as you show, so maybe it’s not worth the trouble.

        Anyways, thanks a lot for your reply, and for this site in general: lot of good reading ahead :-)

        (I have no idea why my code was not properly highlighted: I did use the tilde fences, though)

          Quote
  19. prasoon says:

    Excellent Article. Read it twice :-) .

    Cheers!!

      Quote
  20. Marc says:

    Hello,

    I’ve been trying to use value semantics a lot recently. It is an interesting way of coding, but sadly copy elision (at least as it is currently implemented in compilers) is too restrictive in many cases. For instance if I have: struct Wrapper : Base { Wrapper(Base const& b):Base(b){} }; this constructor will always cause b to be copied (or moved, but for this post I am interested in types where move isn’t faster than copy), even when it comes from a temporary. Sometimes jumping through hoops with emplace-like constructors allows to work around this, but not always. Also, if I have a struct NonAggregate { std::array<Type,5> member; }; (it is not an aggregate because I will give it constructors), I can’t find a way to initialize member without copying all the elements (for an aggregate I could do Aggregate obj={{Type(),Type(),…}}; ).

    Now all of this would change if compilers analyzed their AST and, for every temporary object that is referenced only once and for which that reference is a copy, they collapsed this branch. Obviously I am missing some rather large details, but I believe something like that is necessary if we really want to move towards more value semantics.

      Quote
  21. Chubsdad says:

    Maybe this should be reworded like so: “Unlike lvalues, which can always be used on the left-hand-side of an assignment (if the lvalue is non const)”

      Quote
  22. Vijay Mathew says:

    I once wrote a short blog post on Value Semantics: http://blog.vmathew.in/value-semantics

      Quote
  23. Sean Parent says:

    Reading old articles…

    Minor comment – if you define a function move() as:

    // poor man's 0x move
    template <typename T>
    T move(T& x) {
      T result;
      swap(x, result);
      return result;
    }
    

    Then you can write:

    std::vector<std::string>
    sorted(std::vector<std::string> names)
    {
        std::sort(names.begin(), names.end());
        return move(names);
    }
    

    Sean

      Quote
  24. Hendrik Schober says:

    So what’s wrong with the following (note the &)?

    std::vector<std::string>& const names = get_names();
      Quote
    • SG says:

      I think you meant

      vector<string> const  names = get_names(); // #1
      vector<string> const& names = get_names(); // #2

      Assuming get_names returns by value (and not a reference) and you’re dealing with a “good” compiler there should not be a difference between the two. In the first case (#1) the compiler can make the function construct its return value directly in that space that will be referred to by the name “names”. In #2 the return value that lives on the stack gets an extension of its life-time and a reference to it is created. A good compiler doesn’t even need to allocate space for the reference but I’m not sure if most compilers are that smart (even though they could be).

      If get_names returns a reference it makes a big difference, of course. Is the following code safe?

      string const& blah = string("123") + "456";

      Currently, it is safe. It’s not safe in C++0x (according to the current draft) because operator+(string&&,char const*) returns an rvalue reference –> You get a dangling reference.

      In my opinion, people should not write #2 instead of #1 just because they think they can save a copy. Many current compilers successfully elide the copy in #1. Also, they should not declare functions that return rvalue references (I really hope operator+(string&&,char const*) and others will be fixed) for “temporary recycling” because it opens up the possibility of dangling references.

      Cheers! SG

        Quote
    • Trick question? It’s a syntax error: you can’t cv-qualify the reference itself.

        Quote
  25. Niels Dekker says:

    Thanks for the article, Dave!

    A very minor remark: your recommended copy-and-swap assignment implementation cannot do a fast assignment to itself. Fast self-assignment could be achieved by adding an extra check to the “canonical” version:

    T& T::operator=(T const& x)
    {
      // Self-assignment check:
      if (&x == this) return *this;
      T tmp(x);
      swap(*this, tmp);
      return *this;
    }

    Of course, the speed of self-assignment is rarely relevant. But I find it slightly counter-intuitive, having a self-assignment that might fail! But if a user doesn’t even have enough memory to assign something to itself, she’s probably in deep trouble anyway! ;-)

      Quote
    • Niels Dekker: Fast self-assignment could be achieved by adding an extra check to the “canonical” version

      …which would penalize the usual case and complicate the code in order to optimize a rare case, which is almost always a bad idea.

      Niels Dekker: I find it slightly counter-intuitive, having a self-assignment that might fail!

      These self-assignments never have the form x = x anyway (nobody knowingly does a self-assignment except in test suites). They’re almost always x = y cases, where x and y may or may not refer to the same object. That means we’re in code that has to cope with an exception anyhow. There’s really zero advantage in making self-assignment a no-throw operation.

        Quote
      • peterchen says:

        Penalize? You have 3 trivial ops vs. a heap allocation.

        You are right that it increases complexity, and typical scenarios are ‘x=y’.

        I also agree that the test doesn’t help much if your assignment is copy-and-swap. Still I’d put them in almost by habit, since many assignment operator implementations require careful analysis to prove self-assignment is correct.

          Quote
  26. Great article, but could you also include a conclusion with an example of what we should do and not do (i.e. best practices?). Your article seems more like a discussion than a “good rule to follow”, which makes it difficult to pick out the important parts. But definitely thanks for the article. Keep posting more.

      Quote
    • Thanks for the feedback! Speaking generally, I think the popular C++ literature is long on rules and short on insight, and I’m not naturally inclined to boil things down to a prescription, so it’s great to know when the important stuff doesn’t stand out.

      The guideline is: Pass by value any arguments that you would otherwise copy explicitly.

      But it’s just a guideline; as I explain in this comment, you’ll end up generating bigger code if you often pass lvalues that way, and bigger code, in some circumstances can be slower code—or it can just be unacceptable because of its size. So maybe you can see why I don’t dispense a lot of rules.

        Quote
      • Mathias Gaunard says:

        That guideline appears a bit wrong. For example, with that kind of code

        void vector::push_back(const T& t)
        {
            ...
            new(buffer) T(t);
            ...
        }

        It would be silly to pass t by value.

        So I believe it should be something like:

        Pass by value any arguments that you would otherwise copy explicitly, and for which you do not need to control the storage of the copy.

          Quote
      • Mathias Gaunard says:

        Or rather, pass by value any arguments that you would otherwise copy explicitly and which you don’t want to control the storage of the copy of.

        Indeed, for something like vector::push_back, passing by value is useless, even though you want to explicitly copy the argument.

          Quote
  27. Thomas Petit says:

    It seems that your new style copy assignment operator doesn’t mix well with rvalue references in MSVC10 and in gcc 4.4:

    struct X
    {
       X(){}
       X& operator=(X x){return *this;}
       X& operator=(X&& x){return *this;}
    };
    X foo()
    {
       X x;
       return x;
    }
    int main()
    {
       X x;
       x = foo();   // line 15
       return 0;
    }
    At line 15 (gcc error) :
    In function ‘int main()’:|
    error: ambiguous overload for ‘operator=’ in ‘x = foo()’|
    note: candidates are: X& X::operator=(X)|
    note:                 X& X::operator=(X&&)|
    
    I don’t understand. The copy assignment operator takes an lvalue, the move assignment operator takes an rvalue and foo returns an rvalue. Where is the ambiguity? Is this a bug?
      Quote
    • Hold your horses, people! All (well, much) will be revealed when we cover rvalue refs in the next installment. :-)

        Quote
      • Pavel Matsula says:

        Greetings from Russia! It seems to me that your blog posts “Move it with rvalue references” and “Your next assignment” are somehow broken because I can see neither the article nor the comments. That’s why I can’t get whether one really needs a move constructor at all since an lvalue will get a copy to swap with due to the by-value argument passing and an rvalue will be treated as the argument itself to.. again swap with if the compiler does the copy elision! Am I missing something?

          Quote
        • Pavel Matsula says:

          Oh, sorry, I surely meant there might be no need of a move assignment X& operator=(X&& x); , not a move constructor.

            Quote
  28. Andrzej Krzemienski says:

    Hi, This is a couple of unconnected thoughts that haunted me after reading your article.

    (1) The copy-and-swap idiom. It implements a copy in terms of swap. Then, the default swap function is implemented in terms of three copies. It is all fine if you always implement a customized swap for your classes, but if you don’t, you are risking infinite recursion. Wouldn’t it be safer if the idiom was always accompanied by a note saying that you need to implement swap too?

    (2) In C++0x we will have lval references, rval references, and values (no-references). The Cartesian product with const/non-const qualifier gives 6 possible ways in which we can define function arguments:

    1. fun( YourClass v ); // by value
    2. fun( const YourClass v ); // const copy. useless??
    3. fun( YourClass & v ); // output parameter
    4. fun( YourClass const& v ); // sort of by value, but no copying
    5. fun( YourClass && v ); // temporary that I can change
    6. fun( YourClass const&& v ); // useless – #4 would do

    Numbers 2 and 6 are probably useless, but it still leaves us with four. Even if we do not want to talk about rval refs right now it leaves us with three, which is one too many. Having spent a number of years programming in C++ I do not find it strange any more, but if you look at it from a newcomer’s perspective there should be only two: either I will change your object, or not. I think Andrei Alexandrescu pointed that out somewhere in the discussion groups. It could be only #1 (interested in value) and #3 (interested in an object – memory location). The other two (4 and 5) are just for performance tweaks, aren’t they? Well, #5 is also about unique ownership, but half of its job is still performance, isn’t it? It is troublesome that you have four choices, and after your article, it is clear that it is not clear which one to choose. We may think that we optimize, but in fact we inadvertently pessimize. If we have only two options and the support of copy elision, move semantics, and perhaps something newer and even more powerful, we could just write:

    fun( set<vector> data );

    and be sure that we never add any slowdown. I am not sure what point I am trying to make here, but I’m pretty sure I want to make some point. The two functions below have different argument types. Trying to pick one when matching overload candidates would be ambiguous, but they are still two types.

    1. fun( YourClass v );
    2. fun( YourClass const& v );

    Why do we need #2? Because it is sometimes faster. Why do we need #1? Not sure. Because it is sometimes faster? Perhaps we do not need #1 at all? If we discard it we can change the syntax of #2 to

    fun( YourClass v );

    This is the same as #1 used to be, but since we discarded it there is no ambiguity. If there is some programmer’s knowledge required to perform optimization, shouldn’t it rather be provided via attributes:

    fun( YourClass [[copy]] v );

    But do we need even that? Is the compiler simply not smarter than us?

    (4) Value semantics. It has great support in C++, e.g. in the form of an implicitly defined copy constructor/assignment. A couple of things were suggested to the Standards Committee to make it even better. I just wanted to list them here:

    1. Implicitly defined comparison operator (that is a logical conjunction of member-wise comparisons). It was mentioned in N2326, but never really proposed.
    2. The definition of “the same”. In N2479. It has a status “Outstanding issues” – not sure what it means.
    3. Not generating copy operations implicitly for classes with non-trivial destructors. Proposed in N2904. No idea what its status is.

    Regards.

    • Andrzej Krzemienski: Hi, This is a couple of unconnected thoughts that haunted me after reading your article.

      Heh, just a couple? ;-)

      This is a whole article unto itself! Thanks for your contribution; I may have to respond in pieces.

      (1)The copy-and-swap idiom. It implements a copy in terms of swap.

      An assignment, I think.

      Then, the default swap function is implemented in terms of three copies.

      A copy and two assignments, but…

      It is all fine if you always implement customized swap for your classes, but if you don’t you are risking infinite recursion. Wouldn’t it be safer if the idiom was always accompanied by the note saying that you need to implement swap too?

      …point taken.

      I’ll try to get to the rest of your material soon, but let me say now that much of what you’ve written sounds a lot like thoughts I’ve been having lately. I ask the question this way: what would a language that was designed to support mutable value semantics look like?

    • Sebastian Redl says:

      (1) That’s why it’s customary to use the member swap in the copy-and-swap idiom:

      T& operator =(T t) { t.swap(*this); return *this; }

      No chance of infinite recursion, unless you implemented member swap in the canonical way, and that would be stupid.
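As a minimal sketch of the member-swap form of the idiom, consider the class below. `Buffer` and `assignment_copies_value` are hypothetical names invented for illustration; the point is only that the member `swap` exchanges the data members directly and never calls `operator=`, so the assignment operator cannot recurse into itself.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>

// Hypothetical resource-owning class, used only to illustrate the idiom.
class Buffer {
public:
    explicit Buffer(std::size_t n = 0)
        : size_(n), data_(n ? new int[n]() : nullptr) {}

    Buffer(Buffer const& other)
        : size_(other.size_),
          data_(other.size_ ? new int[other.size_] : nullptr) {
        std::copy(other.data_, other.data_ + other.size_, data_);
    }

    ~Buffer() { delete[] data_; }

    // Member swap: exchanges the members one by one, never calls operator=.
    void swap(Buffer& other) noexcept {
        std::swap(size_, other.size_);
        std::swap(data_, other.data_);
    }

    // Copy-and-swap assignment: the by-value parameter is the copy.
    Buffer& operator=(Buffer other) {
        other.swap(*this);
        return *this;
    }

    std::size_t size() const { return size_; }
    int& operator[](std::size_t i) { return data_[i]; }

private:
    std::size_t size_;
    int* data_;
};

// Assigns one Buffer to another and reports whether the value was copied.
inline bool assignment_copies_value() {
    Buffer a(3);
    a[0] = 7;
    Buffer b(1);
    b = a;  // copies a into the parameter, then swaps it into b
    assert(b.size() == a.size());
    return b.size() == 3 && b[0] == 7 && a[0] == 7;
}
```

The old contents of `b` end up in the parameter and are released when it is destroyed at the end of `operator=`.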

      (2) Const rvalue references are useless to the programmer and should never appear written in a program. (They can appear through template deduction.) Const by-value arguments are also useless, so as you say, we’re down to 4 variants. Let’s leave rvalue references out for the moment. We deal with by-ref, by-val and by-const-ref. Combining the usual C++ wisdom with this article pretty much leads to these guidelines:

      - Unconditionally use by-ref for out or in-out parameters. However, reconsider whether you need out parameters, because returning might be just as efficient.
      - Unconditionally use by-val for arguments that are cheap to copy (primitives) and by-const-ref for arguments that are not copyable.
      - This leaves arguments that are copyable, but expensive to copy. This article essentially says that you should pass these by-val iff you plan to modify them inside the function, but don’t want the modifications to be visible outside.

      The downside of this approach is that you leak an implementation detail into the interface: if you have a traditional assignment operator, you should pass the argument by-const-ref, but if you convert it to a copy-and-swap assignment operator, you should change it to by-val.

      • Andrzej Krzemienski says:

        Your guidelines are clear and fine. But what I was writing about was more fantasizing about what a more perfect language could look like. In fact, one aspect could be achieved only by an even more advanced compiler optimization technique. As I have little (i.e. none at all) familiarity with compilers, I may still be really fantasizing, but just consider:

        FatCopy fc = prepare();
        read1( fc );
        read2( fc );

        Two read functions do not alter the parameter; we are used to declaring:

        void read1( FatCopy const & fc );

        This is in order to avoid copying. This, in turn, is because we are used to thinking that:

        void read1( FatCopy fc );

        means copying. The way I have been taught C++, one thinks that the above line means passing data by copying, unless copy elision is employed. How about changing our thinking to “passing data by value”: there will be no copying, unless the function really changes value fc and copying is really unavoidable. The compiler will decide to use your copy constructor only if necessary, and in situations where you would need a copy anyways. This could be achieved as follows: The compiler always compiles

        void read1( FatCopy fc );

        as passing by reference, and if it finds that fc is modified by read1, it “somehow” records this fact in that function’s metadata, and later, it adds a copy at the call site while linking, so the calling function would be compiled to:

        FatCopy fc = prepare();
        FatCopy __copy = fc;
        read1( __copy );
        read2( fc );

        The copy is made only if read1 modifies fc. Otherwise there is no copy whatsoever.

        Regards.

        • Tony Van Eerd says:

          I would call this a form of “copy on write”, at the compiler level. If you were looking for a name/idiom.

          P.S. sounds like a good idea to me.
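The compile-time scheme discussed here does not exist in C++, but its run-time cousin, ordinary copy-on-write, can be sketched by hand. The following is only an analogy under stated assumptions: `CowNames`, `read1` and `modify1` are hypothetical names, the sharing is tracked with `std::shared_ptr`, and the `use_count()` test is only meaningful in single-threaded code.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical wrapper: a run-time analogue of the idea, not the compile-time scheme itself.
class CowNames {
public:
    explicit CowNames(std::vector<std::string> v)
        : data_(std::make_shared<std::vector<std::string>>(std::move(v))) {}

    // Reading shares the storage; copying a CowNames copies only the pointer.
    std::vector<std::string> const& read() const { return *data_; }

    // Writing detaches first, so the deep copy happens only on modification
    // of a value that is still shared with someone else.
    std::vector<std::string>& write() {
        if (data_.use_count() > 1)
            data_ = std::make_shared<std::vector<std::string>>(*data_);
        return *data_;
    }

    bool shares_storage_with(CowNames const& other) const {
        return data_ == other.data_;
    }

private:
    std::shared_ptr<std::vector<std::string>> data_;
};

// Does not modify its by-value parameter: no deep copy takes place.
inline bool read1(CowNames fc, CowNames const& original) {
    return fc.shares_storage_with(original);
}

// Modifies its by-value parameter: the deep copy happens here,
// and the caller's value is left untouched.
inline bool modify1(CowNames fc, CowNames const& original) {
    fc.write().push_back("extra");
    return fc.shares_storage_with(original);
}
```

With `CowNames fc(...)`, the call `read1(fc, fc)` reports that storage is still shared, while `modify1(fc, fc)` reports that it is not. The cost is a reference count and a check on every write, which is exactly the overhead a compile-time version would hope to avoid.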

          • Actually, I’ve been talking/thinking about exactly Andrzej’s idea for about a year now, and calling it “compile-time copy-on-write.” I think it’s time for the article about ideas for the “ideal language in the spirit of C++.”

    • Andrzej Krzemienski: The copy-and-swap idiom. It implements a copy in terms of swap. Then, the default swap function is implemented in terms of three copies. It is all fine if you always implement customized swap for your classes, but if you don’t you are risking infinite recursion. Wouldn’t it be safer if the idiom was always accompanied by the note saying that you need to implement swap too?

      Maybe, but it’s probably not as scary as you think. Outside namespace std, an unqualified call to swap won’t find std::swap unless you’ve explicitly brought it into scope with a using-declaration.

  29. Brad says:

    Dave, et al. Thanks much for this site and the material so far.

  30. Maxim Yanchenko says:

    One consideration about the following:

    Consider this cousin of our original sorted(…) function, which takes names by const reference and makes an explicit copy:
    std::vector<std::string>
    sorted2(std::vector<std::string> const& names) // names passed by reference
    {
        std::vector<std::string> r(names);        // and explicitly copied
        std::sort(r.begin(), r.end());
        return r;
    }

    In principle, the fact that the function makes a copy is an implementation detail, which is invisible if you use const vector&. When you switch to pass-by-value, you are effectively exposing your implementation to the interface of the function, so in case you later come up with another solution that doesn’t involve copying, you’ll switch to reference with the obvious signature change, that at least will require your clients to recompile (fortunately, their code won’t change unless they are using the signature explicitly, say as a function pointer). Without this, they would just replace .so/.dll with the new version.

    It’s not a major problem, just a point to consider. After all, almost no explicit optimization comes at zero price. (Btw, if elimination of reference-bound temporaries was allowed, it would work in this case as well!)
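To make the trade-off concrete, here is a small sketch that counts element copies for both signatures when the argument is an rvalue. `Name`, `make_names` and the two `copies_made_by_...` helpers are hypothetical; the element type has a move constructor so that only genuine copies are counted.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical element type that counts copy operations.
struct Name {
    static int copies;
    int key;
    explicit Name(int k) : key(k) {}
    Name(Name const& o) : key(o.key) { ++copies; }
    Name(Name&& o) noexcept : key(o.key) {}
    Name& operator=(Name const& o) { key = o.key; ++copies; return *this; }
    Name& operator=(Name&& o) noexcept { key = o.key; return *this; }
    friend bool operator<(Name const& a, Name const& b) { return a.key < b.key; }
};
int Name::copies = 0;

inline std::vector<Name> make_names() {
    std::vector<Name> v;
    v.reserve(3);
    v.emplace_back(3);
    v.emplace_back(1);
    v.emplace_back(2);
    return v;
}

// Pass by value: an rvalue argument is moved or elided, never copied.
inline std::vector<Name> sorted(std::vector<Name> names) {
    std::sort(names.begin(), names.end());
    return names;
}

// Pass by const reference: the explicit copy always happens.
inline std::vector<Name> sorted2(std::vector<Name> const& names) {
    std::vector<Name> r(names);
    std::sort(r.begin(), r.end());
    return r;
}

inline int copies_made_by_sorted() {
    Name::copies = 0;
    std::vector<Name> s = sorted(make_names());
    (void)s;
    return Name::copies;
}

inline int copies_made_by_sorted2() {
    Name::copies = 0;
    std::vector<Name> s = sorted2(make_names());
    (void)s;
    return Name::copies;
}
```

For an rvalue argument the by-value version copies no elements, while the by-reference version copies all three. For an lvalue argument both versions copy each element once, so the difference is confined to temporaries.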

    • First, yeah this is just another manifestation of the code size issue explained in this comment.

      However #1, I have a hard time seeing a switch to pass-by-value as exposing an implementation detail. I’m not sure why; it may just be a gut reaction, but my orientation is toward pass-by-value as a default, and pass-by-reference as an optimization. Nothing obliges the function to modify or steal resources from the copied value, and the copy ought to be side-effect-free. Okay, I’ll admit I’m flailing about in the dark hoping to hit the right explanation for why it’s not an implementation detail. Let me give that some thought.

      However #2, I totally disagree that legalizing your optimization—which I won’t even call “elimination of reference-bound temporaries” because you surely would not want that to be allowed in all cases—would help with the recompilation problem. It’s like I said earlier: your optimization depends on being able to see inside the called function, which is in conflict with the separate compilation model.

      • Andrzej Krzemienski says:

        Hi, Your “However #1” is somehow very inspiring. When I type the function:

        int double1( int const& i ) { return 2*i; }

        and then change it to:

        int double2( int i ) { return 2*i; }

        No-one will say that I exposed any implementation detail. I just want a value. In fact I would probably never write double1, because double2 is so natural: “just give me this integral value”. double1, on the other hand says: “I will take it by reference”. Imagine a function call:

        return double2(5);

        Who is interested in knowing that you will have yet another name to refer to the 5? I just want to double it. Now, ‘const’ means “I will not try to change it, so pass me literals and temporaries as well”. This is “weird” too. I wasn’t asking whether you would be changing it or not, I just wanted to double the number, but now I have to puzzle about whether someone will mutate my value or not. And in fact “int const&” somehow isn’t as mutation-proof as “int” alone. You can cast away const, but you cannot cast the copy back onto the original.
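The remark about casting away const can be illustrated with a short sketch. The function names are hypothetical, and the `const_cast` is well-defined here only because the caller's object is not itself declared const.

```cpp
#include <cassert>

// Takes a reference: a misbehaving implementation can still reach the caller's object.
inline void sneaky(int const& i) {
    const_cast<int&>(i) = 42;  // well-defined only if the referenced object is non-const
}

// Takes a copy: nothing done here can affect the caller's object.
inline void harmless(int i) {
    i = 42;
    (void)i;
}

inline int value_after_sneaky() {
    int n = 5;
    sneaky(n);
    return n;  // 42: the caller's variable was changed through the reference
}

inline int value_after_harmless() {
    int n = 5;
    harmless(n);
    return n;  // 5: only the copy was changed
}
```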

        Also the change from double1 to double2 only exposes a copy operation (and the destructor). Not any other function. The copy operation is special to the extent that compilers implement it for you for your own classes. Function:

        void fun( YourClass cc );

        Doesn’t expose any function of YourClass. It simply requires a value. Well, I know these are just some loose thoughts.

        Regards.

      • x.martian says:
        I have a hard time seeing a switch to pass-by-value as exposing an implementation detail. I’m not sure why;

        Here is why: If your header file looks like:

        class B; class A {public: void Foo(B& b);};

        The user does not need to know the definition of B. On the other hand, if your header file looks like:

        #include "B.h"

        class A {public: void Bar(B b);};

        Now the user of your class is forced to know B.

  31. Sebastian says:

    It seems you’re going to “blogify” the “RValue Reference 101” article, which is nice because it is currently inaccessible to users who don’t have a trac account at boostpro.

    Cheers! Sebastian

  32. Maxim Yanchenko says:

    Hi Dave,

    Could you please comment on this note in the Standard (12.8/15) “when a temporary class object that has not been bound to a reference (12.2) …” Why is there this “has not been bound to a reference” constraint? Consider this (no idea how to get syntax-highlighted code here, a “how to” link would be great):

    struct string
    {
        const char* str;
        string(const char* str) : str(str) { puts("constructor"); }
        string(const string& other) : str(other.str) { puts("copy constructor"); }
    };
    struct holder_1 { string s; holder_1(const string& s) : s(s) {} };
    int main()
    {
        holder_1("some string");
    }
    

    Due to the constraint, holder_1::s can’t be initialized directly from “some string”, and a temporary string will always be created.

    And this constraint appears to be in the latest draft as well.

    Thanks

    • Maxim Yanchenko: Consider this (no idea how to get syntax-highlighted code here, a “how to” link would be great):

      Please see the new “posting” tab.

       1  struct string
       2  {
       3      const char* str;
       4      string(const char* str) : str(str) { puts("constructor"); }
       5      string(const string& other) : str(other.str) { puts("copy constructor"); }
       6  };
       7  struct holder_1 { string s; holder_1(const string& s) : s(s) {} };
       8  int main()
       9  {
      10      holder_1("some string");
      11  }

      Due to the constraint, holder_1::s can’t be initialized directly from “some string”, and a temporary string will always be created.

      I don’t think I agree with your analysis of this example. The type of "some string" is const char[12], and it must be converted to a string in order to match the signature of holder_1’s ctor in line 7 long before holder_1::s is initialized. There’s no opportunity to use the ctor in line 4 as far as I can tell. Am I missing something?
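The sequence of calls can be checked with counters instead of output. This is a sketch under assumptions: `my_string` is a hypothetical stand-in for the `string` class in the example (renamed to avoid confusion with `std::string`), and the two static counters are additions.

```cpp
#include <cassert>

// Hypothetical stand-in for the `string` class in the example.
struct my_string {
    static int conversions;  // calls to my_string(const char*)
    static int copies;       // calls to the copy constructor
    const char* str;
    my_string(const char* s) : str(s) { ++conversions; }
    my_string(const my_string& other) : str(other.str) { ++copies; }
};
int my_string::conversions = 0;
int my_string::copies = 0;

struct holder_1 {
    my_string s;
    holder_1(const my_string& arg) : s(arg) {}  // copies from the reference parameter
};

// True if exactly one conversion and one copy were observed.
inline bool one_conversion_one_copy() {
    my_string::conversions = 0;
    my_string::copies = 0;
    holder_1 h("some string");  // a temporary my_string is bound to the reference
    (void)h;
    return my_string::conversions == 1 && my_string::copies == 1;
}
```

The temporary is created to satisfy the constructor's reference parameter, and the member is then copy-constructed from that reference, so the copy is not a candidate for elision.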

      • Maxim Yanchenko says:

        Well, to me, copyctor elision is (conceptually) something like “get rid of creating a temporary if it’s being used only to initialize an object of the same type” (please correct me right here if I’m wrong), and the wording in the standard is, well, just wording, which is subject to change (e.g. there are two more allowed cases in the latest C++0x draft compared to 2003). To do the elision, the compiler should look ahead anyway, to check the usage of the temporary. E.g. here:

        string s = "some string";
        The sequence of calls should be string(const char*)->string(const string&), but the compiler “looks a bit further and sees” that the copyctor initializes from a temporary, and gets rid of it (of course, making all necessary checks about copyctor availability).

        I don’t see why it can’t be done here as well, as the compiler has all code, everything is inline etc. It’s an optimization, and the compiler should be clever enough to look ahead (and we know they are in many cases, like link-time optimization). But here it’s simply forbidden by the standard because the temporary has been bound to a reference. That’s why I’m asking why do we have this constraint.

        • Maxim Yanchenko: It’s an optimization, and the compiler should be clever enough to look ahead (and we know they are in many cases, like link-time optimization). But here it’s simply forbidden by the standard because the temporary has been bound to a

          OK, I understand what you’re asking, though I’m not at all sure that eliminating the text in question would be enough to allow that particular optimization. For what it’s worth, your idea is in a completely different class from today’s copy elision, because the current optimizations only “look a bit further” at the call site, and don’t require looking inside the callee as yours does, which in principle is possible though it may be too late by link time, practically speaking (too much high-level information missing by then).

          I don’t really know why we have that constraint (“see D&E” is my stock answer), but if I had to guess I’d say it was there to prevent lvalues from being mutated in scenarios like

          X produce();
          int consume(X);
           
          X&amp; a = produce();
          int b = consume(a);

          If line 5 mutated a that would be pretty surprising.

          • Rodrigo says:

            I read somewhere that it is because the committee agreed (for C++98) that some use-cases exist where monitoring the number of copies is useful. I’m completely opposed to this; the programmer that needs to monitor the number of copies created also needs to know the strange cases where the copy is allowed to be avoided.

          • Rodrigo: I read somewhere that it is because the committee agreed (for C++98) that some use-cases exist where monitoring the number of copies is useful.

            I would be very surprised if the particular optimization Maxim was asking for was ever considered, and even more surprised if it were ruled out on those grounds. That seems completely inconsistent with the intent of copy elision.

          • Rodrigo says:

            Yes sorry Dave, I misread it.

            However, Alex Stepanov in his notes and his latest book smartly points out that if we could force construction, copying and equality to retain their expected semantics, the compiler could apply that optimization.

            The committee, preferring freedom, didn’t enforce any semantics. It means that

            struct s { s(const char*); s(const s&); };

            bool operator==(const s&, const s&);

            s a1 = "123"; s a2 = a1;

            (a1 == a2); // could be false, the programmer could rely on it being false!

            then, we simply aren’t allowed to rewrite s a2 = a1; to s a2 = "123";
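A sketch of the kind of type that makes this rewrite unsafe follows. The `copied` flag and the helper functions are hypothetical additions, and direct-initialization is used for `a1` so that the result does not depend on whether the compiler elides a copy.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical type whose copy constructor does not preserve the value.
struct s {
    const char* text;
    bool copied;
    s(const char* t) : text(t), copied(false) {}
    s(const s& other) : text(other.text), copied(true) {}  // legal, but not a faithful copy
};

inline bool operator==(const s& a, const s& b) {
    return a.copied == b.copied && std::strcmp(a.text, b.text) == 0;
}

inline bool copy_equals_original() {
    s a1("123");
    s a2 = a1;  // a1 is an lvalue, so the copy constructor really runs
    return a1 == a2;
}

inline bool rewritten_equals_original() {
    s a1("123");
    s a2("123");  // the "rewritten" form: built from the same initializer
    return a1 == a2;
}
```

Because the language allows such a copy constructor, the two forms are observably different, and a compiler cannot substitute one for the other in general.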

          • I think you mean “the programmer couldn’t rely on it being true!” In any case, yes, Elements of Programming and regular types will definitely be topics in upcoming articles.

          • Maxim Yanchenko says:
            Rodrigo: s a1 = "123"; s a2 = a1;

            a1 is not a temporary here, so it doesn’t apply. Here is my conceptual understanding for the copy elision:

            “If a temporary object a1 is only used to initialize another object a2 of the same type, a1 can be eliminated and a2 can be directly initialized using a1’s initializer”

            In your example a1 is also used later in comparison (and it’s not a temporary object at all), so it can’t be eliminated.

            Rodrigo, Dave, do you agree with this conceptual definition for the copy elision?

          • Rodrigo says:

            In any case yes.

            One thing I hate about C++ is the copy-elision rules. It would be easier if the problem you point out were surely caused by a weak compiler instead of a C++ rule.

            While I’m not completely sure if your optimization is currently allowed in C++, for me it makes sense.

            Btw my point was that a copy from an object could be different from an object created using the same initializer from the source.

          • Seth says:

            Well said, Dave! What they probably mean is that they accidentally do URVO in debug mode. I wonder what happens if someone reported THAT as a bug :)

          • Maxim Yanchenko says:
            Dave Abrahams:it may be too late by link time, practically speaking (too much high-level information missing by then).

            Well, I mentioned link time just to emphasize the power of present-day optimizers. For the code in question, all analysis can be done at compile time. It’s an optimization, which is never mandatory, so we should be ready for the cases when it doesn’t work (e.g. it falls to link time).

            Dave Abrahams: If line 5 mutated a that would be pretty surprising.

            It looks like you have meant const X& a = produce();, right? Something strange with the markup.

            Well (all the latter is not strictly according to the standard definitions), a is not a “real temporary” here: the object returned by produce is, but when you bind it to a reference that extends its lifetime, it’s not a temporary anymore in terms of elision, as you explicitly say “I want this object to live beyond the full expression it was created in”. OTOH, it applies to all other references as well (e.g. references in parameters), so probably they should be also subject to elision.

            As long as a is used (and only used) to initialize a temporary (I believe this is the necessary precondition for the elision) in consume’s parameter, I see no problem with a being optimized out.

            But this should probably also mean that we don’t rely on the destructor of X (obvious usage is the Guard idiom).

            I need to think more about it; I still don’t have a clear picture.

          • litb says:

            I think there is another version of that. Consider:

            void f() {
              try {
                prolly_failing();
              } catch(Exception &e) {
                Exception e1(e);
                e1.modify();
                  // oops, e is modified potentially
              }
            }

            Since exception objects are temporaries too, the restriction about reference binding makes the irritating case above impossible.

          • litb says:

            Hi there. A similar example where we would mutate a temporary that has got a name:

            void f() {
              try {
                dangerous();
              } catch(Exception &e) {
                // (seemingly) copy it then (seemingly) modify
                // only the copy
                Exception copy(e);
                copy.enjoy();
                throw;
              }
            }

            We would accidentally modify the exception object. I think the restriction of the reference binding will generally make it so we can’t refer to the temporary object a second time.
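What the current rules guarantee here can be checked directly. In this sketch `Exception` is a hypothetical aggregate and the two helper functions are invented for illustration; each one rethrows from an inner handler and returns what the outer handler observes.

```cpp
#include <cassert>

struct Exception { int code; };  // hypothetical exception type

// Modify a copy inside the handler, rethrow, and report what the outer handler sees.
inline int code_seen_after_modifying_copy() {
    try {
        try {
            throw Exception{1};
        } catch (Exception& e) {
            Exception copy(e);  // a real copy of the exception object
            copy.code = 2;
            throw;              // rethrows the original exception object
        }
    } catch (Exception& e) {
        return e.code;
    }
    return -1;  // not reached
}

// Modify through the reference instead: the rethrown object carries the change.
inline int code_seen_after_modifying_reference() {
    try {
        try {
            throw Exception{1};
        } catch (Exception& e) {
            e.code = 2;
            throw;
        }
    } catch (Exception& e) {
        return e.code;
    }
    return -1;  // not reached
}
```

The first function returns 1 and the second returns 2, so modifying the copy leaves the in-flight exception untouched, which is the behaviour an elision across the reference binding would break.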

    • unomadh says:

      Hello Dave, Sweet article; it brought me new layers of understanding about so many things, thanks a lot.

      A question now, given this piece of code:

      struct T
      {
          T()                         { cout << "+ " << this << endl; }
          T(T && f)                   { cout << "+ " << this << " move " << &f << endl; }
          T(T const& f)               =delete;//{ cout << "+ " << this << " copy " << &f << endl; }
      };
      T func (T f) {
          return T {};
      }
      int main () {
          T t2 = func(T {});
      }
      

      Output:

      + 0x7fffb4e139a0
      + 0x7fffb4e13980
      

      Meaning no copy due to elision, and even no move. This code compiles (GCC 4.7) even if the copy constructor is explicitly deleted, but isn't it required in theory? Is this standard behaviour? On the other hand, it doesn't compile with a deleted move constructor, even though the move constructor is not used, just like the copy constructor.

      Thanks

      • Marc says:
        Meaning no copy due to elision, and even no move. This code compiles (GCC 4.7) even if copy constructor is explicitly deleted, but isn’t required in theory ? Is it a standard behaviour ? On the other hand it doesn’t compile with deleted move constructor, even if not used just like copy.

        Note that you can disable elision with -fno-elide-constructors to see the difference. There is no copying in your example, only moves. Where do you expect you might need a copy construction?
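A counter makes the effect of the flag visible without reading addresses. The sketch below mirrors the example with a hypothetical `moves` counter added. Under the C++11 and C++14 rules the move constructor must be accessible even when every move is elided, which is why deleting it breaks the build; since C++17 these particular elisions are mandatory and no move constructor is needed for them.

```cpp
#include <cassert>

// Move-only type from the example, with a hypothetical counter added.
struct T {
    static int moves;
    T() {}
    T(T&&) { ++moves; }
    T(T const&) = delete;  // never needed: every source here is an rvalue
};
int T::moves = 0;

inline T func(T) { return T{}; }

// Number of move constructions actually executed for the statement in main.
inline int moves_observed() {
    T::moves = 0;
    T t2 = func(T{});
    (void)t2;
    return T::moves;
}
```

With elision enabled the function returns 0. Compiling as C++11 or C++14 with `-fno-elide-constructors` should make the moves show up in the count.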

        • unomadh says:

          Huh that’s right. So the compiler seems to request a copy constructor even with the elision. Is it right ?

  33. Joel Falcou says:
    Dave Abrahams It looks vaguely as though you’re trying to show something about const vs. non-const member functions, but that distinction doesn’t make any difference to copy elision, so maybe I’m misunderstanding.

    Well, I was trying to have a function like the sort in the article and see what happens when I call it. The memory allocation is there so I have costly things going on when copy elision is not done.

    Basically, is there a 2-3 class example that, when compiled, tells you: look, here there is elision; there, no elision?

  34. Joel Falcou says:

    Ok. So I wanted to see if I can “check” that these things work with my current g++. So, I wrote this: http://codepad.org/craBkxXL

    The output is, for gcc 4.3:

    non-const call
    A new : 4
    A delete : 6
    A copy : 3

    non-const call
    B new : 4
    B delete : 6
    B copy : 3

    The non-const call is there to “emulate” the sort function from the article.

    So, does this mean the copy stuff is done, or am I not calling the proper thing and hence not triggering the mechanism, or is gcc 4.3 not copy-elision aware (which I doubt)?

    I think I’m just not doing the correct thing to check this. So how should I butcher this so I can validate the use of copy elision and show this to unbelieving co-workers?

    • Joel, it’s a little hard to tell what you’re trying to demonstrate with this example. It looks vaguely as though you’re trying to show something about const vs. non-const member functions, but that distinction doesn’t make any difference to copy elision, so maybe I’m misunderstanding. Try cutting it down to the absolute minimum (e.g. remove all that memory allocation stuff) and if you’re testing several different things, separate those tests as well.

    • Agnostic says:

      The behavior depends on compiler options and on how the temporary is obtained: http://ideone.com/QYYZKl

      gcc 4.3

      non-const call
      A new : 5
      A delete : 7
      A copy : 3

      non-const call
      B new : 3
      B delete : 5
      B copy : 1

  35. Thomas Petit says:

    Very interesting article !

    Sure, copy elision really messes with our C++ programmer’s “common sense”. It’s quite surprising to find a better way of writing the assignment operator after all these years. There are certainly a lot of textbooks to correct. :)

    However, exploiting copy elision further than that, especially for return values, seems a bit fragile to me:

    1) It’s hard to check if RVOs really take place, unless adding I/O in constructors.
    2) Turn on debug mode and all these nice copy elisions disappear. (at least with MSVC)

    “we have all the background we need to attack move semantics, rvalue references, perfect forwarding, and more as we continue this article series. See you soon!”

    Great! I’m really looking forward to seeing that. There are a lot of resources on move semantics out there, but it’s still very confusing to me. For example, I’ve yet to see a straightforward explanation of the correct and efficient way of writing functions like “sorted” in the presence of move semantics.

    std::vector<std::string> /*&& ?*/
    sorted(std::vector<std::string> /*&& ?*/ names)
    {
        std::sort(names.begin(), names.end());
        return /*std::move ?*/ names;
    }
    

    • Thomas Petit: However, exploiting copy elision further than that, especially for return value, seems a bit fragile to me : 1) It’s hard to check if RVOs really take place, unless adding I/O in constructors.

      Any side-effect will do; you can increment a counter.
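A counter-based probe along these lines might look as follows. `Probe` and the four `count_...` helpers are hypothetical names; the only case the language pins down in every mode is the lvalue argument, which always costs exactly one copy, so the other three counts are what tell you whether your compiler and options elide.

```cpp
#include <cassert>

// Hypothetical probe type: counts every copy or move construction.
struct Probe {
    static int copies_or_moves;
    Probe() {}
    Probe(Probe const&) { ++copies_or_moves; }
    Probe(Probe&&) { ++copies_or_moves; }
};
int Probe::copies_or_moves = 0;

inline Probe make_unnamed() { return Probe(); }     // candidate for RVO
inline Probe make_named() { Probe p; return p; }    // candidate for NRVO
inline void take(Probe) {}

inline int count_unnamed_return() {
    Probe::copies_or_moves = 0;
    Probe p = make_unnamed();
    (void)p;
    return Probe::copies_or_moves;
}

inline int count_named_return() {
    Probe::copies_or_moves = 0;
    Probe p = make_named();
    (void)p;
    return Probe::copies_or_moves;
}

inline int count_rvalue_argument() {
    Probe::copies_or_moves = 0;
    take(Probe());
    return Probe::copies_or_moves;
}

inline int count_lvalue_argument() {
    Probe::copies_or_moves = 0;
    Probe p;
    take(p);  // copies of lvalue arguments are never elided
    return Probe::copies_or_moves;
}
```

A result of 0 from the first three helpers means the corresponding elision took place. Running the same helpers in debug and release builds gives a direct answer to the fragility question for a particular toolchain.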

      2) Turn on debug mode and all these nice copy elisions disappear. (at least with MSVC)

      I just did a quick test with the MSVC 2010 beta, and for that version you’re half right (elisions of arguments passed by value still happen in debug mode) update: see below . That’s a bit surprising, actually: a copied return value doesn’t help make debugging much easier (especially when argument copies are still getting elided), is likely to be misleading for those people looking for release-mode performance, could actually make debug-mode performance unacceptable even for testing, and it doesn’t take significantly more compile-time resources to do the elision. Once you implement RVO it seems like more work to leave a branch in the compiler where it’s disabled.

      It is true that copy elision isn’t guaranteed by the standard, so vendors are free not to implement it, or to turn it off depending on compiler options. But is “fragile” really the right word? It’s not as though any vendor who implements copy elision can afford to break it in their next release. Also, given the lack of standard guarantees, I don’t see why you’d be more worried about exploiting return value elisions than argument copy elisions.

  36. Hi Dave,

    Thanks for the great article. I however have a slightly deeper question regarding rvalues, move semantics, and copy elision:

    How do you ensure that the object passed by value is definitely copied and the copy is not elided by the compiler? Also, how then do you make sure that you really do have a copied object instead of a move-constructed object as a function parameter?

    void foo(T t); // how to make sure t is copy constructed, instead of move-constructed?

    • Dean Michael Berris: How do you ensure that the object passed by value is definitely copied and the copy is not elided by the compiler?

      If you want to be sure to avoid copy elision (why would you want to do that?) then you need to pass an lvalue. Copies of lvalue function arguments are never elided.

      Also, how then do you make sure that you really do have a copied object instead of a move-constructed object as a function parameter? void foo(T t); // how to make sure t is copy constructed, instead of move-constructed?

      Well, we haven’t even touched move construction yet, but as long as you’re bringing it up here, again I wonder why you’d want to do that? The answer is the same: pass an lvalue.
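For readers who want to see the distinction, here is a sketch with separate counters for copies and moves. `Tracked` and the helper functions are hypothetical, and `std::move` is shown only for contrast with the lvalue case.

```cpp
#include <cassert>
#include <utility>

// Hypothetical type that counts copies and moves separately.
struct Tracked {
    static int copies;
    static int moves;
    Tracked() {}
    Tracked(Tracked const&) { ++copies; }
    Tracked(Tracked&&) { ++moves; }
};
int Tracked::copies = 0;
int Tracked::moves = 0;

inline void foo(Tracked) {}

// Passing an lvalue: the parameter is copy-constructed, exactly once.
inline bool lvalue_argument_is_copied() {
    Tracked::copies = 0;
    Tracked::moves = 0;
    Tracked x;
    foo(x);
    return Tracked::copies == 1 && Tracked::moves == 0;
}

// Passing std::move(x): the parameter is move-constructed instead.
inline bool moved_argument_is_moved() {
    Tracked::copies = 0;
    Tracked::moves = 0;
    Tracked x;
    foo(std::move(x));
    return Tracked::copies == 0 && Tracked::moves == 1;
}
```

Neither case is subject to elision, because the source is a named object rather than an unnamed temporary.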

  37. Very interesting article. I tend to “const &” by default and this article reminds me that this is not as optimal as I thought.

  38. Joel Falcou says:

    I’m intrigued by how we can check, for a given compiler at hand, whether it’s actually doing copy elision and generating the proper code when we’re using such idioms. I guess adding I/O in the constructor doesn’t help, as it’ll force the ctor to be called. Do you just check the assembly output or what?

    Moreover, does this mean that we can completely and forever dump the old pass-by-const-reference & copy operator=?

  39. dvi says:

    Could you elaborate further on this paragraph?

    “First, when you pass parameters by reference and copy in the function body, the copy constructor is called from one central location. However, when you pass parameters by value, the compiler generates calls to the copy constructor at the site of each call where lvalue arguments are passed. If the function will be called from many places and code size or locality are serious considerations for your application, it could have a real effect.”

    I don’t quite follow it. Good article though, thanks!

    • @dvi: first, you’re most welcome, and thanks for asking for clarification; it helps to know when I fail to connect.

      So here, f takes its argument by value, and does whatever it does… but it doesn’t copy a because (copy elision aside) it already has a copy of whatever was actually passed.

       1  void f(X a) { … modify(a);}
       2  void g()
       3  {
       4      X b;
       5      f(b); // call f with an lvalue
       6  }
       7  void h()
       8  {
       9      X c;
      10      f(c);  // call f with an lvalue
      11  }

      In this case, the compiler has to generate calls to X’s copy constructor in the body of g and h at lines 5 and 10. That’s a total of two calls. Probably not a big deal, but in some embedded applications, for example, there’s limited space available for code.

      Now compare with what happens when f takes its argument by reference and copies it:

      void f(X const& r)
      {
          X a(r);    // explicit copy
          …
          modify(a);
      }

      Now there’s just one call to X’s copy constructor, in the body of f. The exact same definitions of g and h still work, but calling f no longer involves copying at the call site.

      Does that help?

